Toward empirical correlations for estimating the specific heat capacity of nanofluids utilizing GRG, GP, GEP, and GMDH

When nanoparticles are dispersed and stabilized in a base-fluid, the resulting nanofluid undergoes considerable changes in its thermophysical properties, which can have a substantial influence on the performance of nanofluid-flow systems. With such necessity and importance, developing a set of mathematical correlations to identify these properties in various conditions can greatly eliminate costly and time-consuming experimental tests. Hence, the current study aims to develop innovative correlations for estimating the specific heat capacity of mono-nanofluids. The accurate estimation of this crucial property can result in the development of more efficient and effective thermal systems, such as heat exchangers, solar collectors, microchannel cooling systems, etc. In this regard, four powerful soft-computing techniques were considered, including Generalized Reduced Gradient (GRG), Genetic Programming (GP), Gene Expression Programming (GEP), and Group Method of Data Handling (GMDH). These techniques were implemented on 2084 experimental data-points, corresponding to ten different kinds of nanoparticles and six different kinds of base-fluids, collected from previous research sources. Eventually, four distinct correlations with high accuracy were provided, and their outputs were compared to three correlations that had previously been published by other researchers. These novel correlations are applicable to various oxide-based mono-nanofluids for a broad range of independent variable values. The superiority of newly developed correlations was proven through various statistical and graphical error analyses. The GMDH-based correlation revealed the best performance with an Average Absolute Percent Relative Error (AAPRE) of 2.4163% and a Coefficient of Determination (R2) of 0.9743. 
Finally, a leverage statistical approach was employed to identify the GMDH technique's application domain and outlier data, and a sensitivity analysis was carried out to clarify the degree of dependence between input and output variables.

The utilization of nanofluids for a variety of practical purposes is a popular and developing scientific field that is attracting increasing attention. Some of the most important applications of nanofluids, which can be observed in numerous research studies and publications, include the following:
• heat exchangers with different configurations (such as shell and tube, finned tube, double pipe, etc.) 3
Until now, many analytical and numerical calculations have been performed on fluid-flow problems based on various kinds of nanofluids, mainly in research pertaining to the sectors of energy, electronics, and medicine. For instance, Jia et al. 12 carried out the numerical analysis of a glazed photovoltaic-thermal (PV/T) collector employing two kinds of coolant nanofluid (Al₂O₃/W and TiO₂/W) with the aim of investigating its thermal and electrical performances. McCash et al. 13 mathematically conducted a comparative analysis for the peristaltic flow of two distinct nanofluids (Ag-Cu/W and Cu/W) inside an elliptical duct with sinusoidally advancing boundaries. Shahzad et al. 14 analyzed the electro-osmotic flow of a blood-based hybrid nanofluid consisting of single- and multi-walled carbon nanotubes (SWCNT and MWCNT) through a multiple-stenosed artery with permeable walls. Alghamdi et al. 15 mathematically modeled the two-dimensional magneto-hydrodynamic (MHD) flow based on the Cu-Al₂O₃/W hybrid nanofluid between two co-axial stretchable disks and numerically assessed it using the finite element method (FEM). Baig et al. 16 presented exact analytical solutions for the stagnation-point flow of water-dispersed MWCNTs passing across a heated stretching cylinder.
Nanofluids have exceptional properties compared to normal fluids, which makes them a preferred choice in the field of mass and heat transfer phenomena. Once the nanoparticles have been mixed into the base-fluid and stabilized, the thermophysical properties of the resultant nanofluid, such as viscosity (μ), thermal conductivity (k), specific heat capacity (C_p), and density (ρ), can be significantly altered. Several research efforts have been conducted on this topic in order to determine what other factors affect nanofluid properties and how these properties behave under different operational conditions. The present study only looks into changes in the specific heat capacity of mono-nanofluids (C_p,nf) as a function of the specific heat capacity of nanoparticles (C_p,np) and base-fluids (C_p,bf), temperature (T), particle volume fraction (ϕ_v), and average particle size (d_np).
Recently, many researchers have been attempting to establish the specific heat values of nanofluids in various ways, such as by performing experimental tests, applying machine learning techniques (black-box models), and developing empirical mathematical relationships (white-box models). In 2008, Vajjha and Das 17 released an empirical correlation for estimating the specific heat capacity of Al₂O₃/W+EG nanofluid as a function of temperature (313-363 K) and volume concentration (2-10 vol%). Their experimental measurements served as the basis for this cubic polynomial correlation. In addition, by comparing their measurements with estimated values achieved from the theoretical model of Pak and Cho 18, they observed an acceptable level of agreement, with maximum and average deviations reported as 22% and 15%, respectively. In another investigation conducted in 2009, Vajjha and Das 19 created a specific heat correlation using the experimental data related to SiO₂/W, ZnO/W+EG, and Al₂O₃/W+EG nanofluids. Their correlation took into account four variables: the specific heat capacities of both nanoparticles and base-fluids, temperature, and particle volume concentration. This correlation set a defined range for each of the first two variables, and its constants for the three types of nanofluids were different. In a subsequent study in 2012, Vajjha and Das 20 updated the correlation proposed in reference 19 by incorporating the experimental data of CuO/W+EG nanofluid and making the temperature term dimensionless by introducing a reference temperature. In 2013, Barbés et al. 21 assessed the specific heat capacity of Al₂O₃/EG and Al₂O₃/W nanofluids as a function of temperature and volume fraction. The researchers also developed a simple linear regression model that suited their experimental data well. In the next year, they performed a similar study 22 on CuO/W and CuO/EG nanofluids.
In 2015, Cabaleiro et al. 23 studied the specific heat changes of five distinct metal-oxide-based nanofluids at high concentrations. These nanofluids contained ZrO₂, ZnO, and MgO in pure EG, as well as ZrO₂ and ZnO in the W+EG mixture (50:50 vol%). Finally, the researchers developed a specific heat correlation as a function of the nanoparticle and base-fluid specific heat capacities and particle volume fraction. In another study, Sekhar and Sharma 24, while studying the specific heat capacity of Al₂O₃/W nanofluid at low concentrations (0.01-1 vol%), proposed a specific heat correlation in accordance with 81 experimental data-points of water-based Al₂O₃, SiO₂, TiO₂, and CuO nanofluids collected from other studies. This regression equation was valid in a particular range of nanofluid temperatures, nanoparticle diameters, and volume fractions. The calculated values and experimental data agreed well with each other, with deviations ranging between −8% and +10%. Satti et al. 25, after evaluating the specific heat capacities of five nanofluids consisting of CuO, ZnO, SiO₂, TiO₂, and Al₂O₃ nanoparticles suspended in a W+PG mixture (40:60 wt%), fitted a correlation to 610 measured data-points by employing the Minitab statistical software.
Popa et al. 26, based on their experimental measurements of the specific heat capacity of CuO/W, Al₂O₃/W, and Al₂O₃/EG nanofluids, recalibrated the correlation developed by Vajjha and Das 19 and obtained new coefficients for the analyzed nanofluids. The authors reported excellent agreement with a maximum Average Relative Error of nearly 2%. Moldoveanu and Minea 27, based on their measurements of the specific heat capacities of TiO₂/W, SiO₂/W, and Al₂O₃/W nanofluids, modified Sekhar and Sharma's 24 equation to propose another correlation. Its validity was limited to room temperatures and volume fractions of less than 5%. The average deviation of the specific heat values estimated via their correlation was reported to be roughly 11% when compared to the experimental results. In 2020, Çolak et al. 28 presented a correlation with an average error of −0.005% utilizing a database with 1287 data-points made up of volume fraction and temperature (as independent variables) and experimental specific heat values of Cu-Al₂O₃/W hybrid nanofluid. Gao et al. 29 proposed a correlation that was fitted to the experimental specific heat capacities of the GrapheneOxide-Al₂O₃/W hybrid nanofluid with mass fractions of 0.05-0.15 wt% over a temperature range of 20-70 °C.
Since the specific heat capacity of nanofluids relies on various conditions and factors, it is often time-consuming, expensive, and difficult to quantify this property accurately via experimental procedures. To determine the specific heat capacity values of mono-nanofluids, robust machine-learning methods were employed in our recent study 30, but the current work intends to provide novel empirical correlations. The literature review clearly indicates that the number of correlations developed so far by other researchers is limited. Moreover, the existing correlations are valid only for certain nanofluids and cover only a restricted range of independent variables. As a result, there is still an opportunity for advancement in this subject, and it is conceivable to provide more comprehensive correlations with fewer constraints that can be applied to an extensive collection of nanofluids.
The current study's primary aim is to provide novel and precise mathematical correlations for estimating the specific heat capacity of mono-nanofluids. It should be noted that the considered mono-nanofluids have been selected only based on the oxide-based or non-metallic nanoparticles dispersed in conventional base-fluids. For this purpose, an extensive database consisting of experimental data taken from the literature is utilized (the same one involved in our previous study 30), and finally, four correlations are created using robust soft-computing procedures. To evaluate and compare the quality of the present correlations with prior ones suggested by others, various statistical and graphical error analyses are applied. The principal advantage of these new mathematical equations is that they considerably mitigate the limitations of the earlier correlations. Therefore, they are applicable to many different oxide-based mono-nanofluids and a broad range of independent input variables.

Dataset specifications
The specific heat capacity of mono-nanofluids (C_p,nf) can be defined as a function of average particle size (d_np), particle volume fraction (ϕ_v), temperature (T), nanoparticle specific heat capacity (C_p,np), and base-fluid specific heat capacity (C_p,bf). Accordingly, 2084 data-points related to the experimental values of C_p,nf were collected from diverse sources (19 references) available in the literature 17,19,21-23,25,29,31-42. Before generating the final dataset, necessary corrective actions were carried out in the data pre-processing stage, such as data cleaning, data integration, and data reduction, to ensure more accurate estimations. Table S1 (in the Supplementary File) lists the references used to extract the required data, the characteristics related to the various types of nanoparticles and base-fluids available in the final database, as well as the ranges of input (independent) and output (dependent) variables. Additionally, Table S2 (in the Supplementary File) provides the descriptive statistics for each of the variables included in the database. It should be mentioned that the information given in these two tables was also reported in our prior article 30. Moreover, box-and-whisker plots and frequency distribution histograms related to the six variables available in the database were drawn in previous work 30 and are therefore not repeated here.

Description of techniques
The modeling and prediction of complex or unknown systems' behavior employing input-output data is a common use of system identification techniques. Because of this, there is now much more interest in soft-computing techniques, which deal with data processing in uncertain and inexplicit environments. Many scientific studies have discussed the application of evolutionary algorithms as efficient soft-computing tools for system identification 43-46.
Due to the data-driven nature of all soft-computing strategies, having more data leads to a more reliable and comprehensive model. In this regard, the present study covers four popular techniques for creating distinct mathematical models to estimate the specific heat capacities of mono-nanofluids. These techniques include Generalized Reduced Gradient (GRG), Genetic Programming (GP), Gene Expression Programming (GEP), and Group Method of Data Handling (GMDH), which are all covered with further details in the following subsections.
The specific algorithms adopted in this study were essentially chosen based on their suitability and ability to address the research goal, their compatibility with the quantity and quality of data-points contained within the dataset, and their own unique strengths and weaknesses. The successful usage and efficient performance of these techniques have been proven across a variety of investigations conducted in numerous fields of engineering sciences 47-54.

A: Generalized Reduced Gradient (GRG)
One of the typical strategies used to solve multi-variable problems is the generalized reduced gradient (GRG). In this technique, the decision variable is the vector X = (x_1, x_2, …, x_n), and the constraints are the functions g_1, g_2, …, g_m. The objective function or any constraint function can be linear or non-linear. Additionally, the variable ranges may be left undefined. If no constraints exist, the problem is called an unconstrained optimization problem. It should be noted that the lower and upper limits of the variables do not act as additional constraints, but are applied separately to the given program 55.
The GRG technique uses the first-order partial derivatives of each function g_i with respect to each variable x_j, which are approximated by forward or central finite differences. The program is executed according to the simulator's initial values. If the values provided by the simulator do not satisfy all constraints
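The finite-difference derivative estimates mentioned above can be sketched as follows. This is only an illustration of forward and central differencing, not the Excel Solver implementation; the function names, the step size `h`, and the sample function `g` are assumptions introduced here.

```python
def forward_diff(f, x, j, h=1e-6):
    """Forward-difference estimate of the partial derivative df/dx_j at x."""
    xp = list(x)
    xp[j] += h
    return (f(xp) - f(x)) / h


def central_diff(f, x, j, h=1e-6):
    """Central-difference estimate of df/dx_j at x (second-order accurate)."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp) - f(xm)) / (2.0 * h)


def gradient(f, x, scheme=central_diff):
    """Vector of first-order partial derivatives of f at x."""
    return [scheme(f, x, j) for j in range(len(x))]


# Hypothetical constraint function g(x) = x0^2 + 3*x0*x1 (illustration only).
def g(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]


grad_central = gradient(g, [2.0, 1.0])                       # analytical gradient: [7, 6]
grad_forward = gradient(g, [2.0, 1.0], scheme=forward_diff)
```

The central scheme costs two function evaluations per variable instead of one, but its truncation error shrinks with h² rather than h.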

C: Gene Expression Programming (GEP)
In 1999, Ferreira first presented the concept of Gene Expression Programming (GEP) as a strong soft-computing technique 64,65. It is an advanced and upgraded version of GP and employs two major computational components in its structural computations for regression tasks, namely chromosomes (genotype) and expression trees (phenotype). In contrast, the GP structure utilizes nonlinear forms of responses, namely parse trees. In effect, the chromosomes encode the candidate (initial) solutions for a particular problem, and an interpretation strategy then translates them into expression trees that provide more accurate solutions 66-68. The simple flowchart related to this technique is shown in Fig. 2.
GEP, as a full-fledged genotype/phenotype technique, develops computer programs encoded in fixed-length linear chromosomes. The structure of linear chromosomes makes possible the beneficial and unrestricted activity of crucial genetic operators like recombination, transposition, and mutation. In addition, GEP shares with GP the same kind of tree representation, which makes it easy to retrace the steps taken by GP and to explore the new frontiers opened up by crossing the phenotype threshold 66. In recent decades, GEP, as a preferred and recognized evolutionary approach for automatically creating computer programs, has advanced quickly, and a variety of advanced GEP variants have been suggested for a rapidly growing number of real-world applications 69.
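To make the genotype/phenotype distinction concrete, the sketch below decodes a linear chromosome into an expression tree and evaluates it. It assumes the breadth-first (Karva) reading of a gene used in Ferreira's GEP; the function set, the symbol `Q` for square root, and the sample gene are illustrative choices and are not taken from the present study.

```python
import math
from collections import deque

# Assumed function set: symbol -> (arity, implementation). "Q" denotes sqrt.
OPS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "Q": (1, lambda a: math.sqrt(a)),
}


def decode(gene):
    """Translate a linear gene (genotype) into an expression tree (phenotype).

    Symbols are read left to right and placed level by level; symbols left
    over after the tree is complete form the non-coding tail and are ignored.
    """
    symbols = iter(gene)
    root = [next(symbols), []]          # node = [symbol, children]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        arity = OPS[node[0]][0] if node[0] in OPS else 0
        for _ in range(arity):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Evaluate an expression tree for the terminal values given in env."""
    symbol, children = node
    if symbol not in OPS:
        return env[symbol]
    return OPS[symbol][1](*(evaluate(c, env) for c in children))


# The gene "Q*+-abcdab" encodes sqrt((a + b) * (c - d)); the final "ab" is unused.
tree = decode("Q*+-abcdab")
value = evaluate(tree, {"a": 1.0, "b": 3.0, "c": 5.0, "d": 1.0})
```

Because the chromosome has a fixed length while the tree uses only as many symbols as it needs, operators such as mutation can change any position and still yield a valid expression, provided the tail holds terminals only.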

D: Group Method of Data Handling (GMDH)
A novel computational method for managing complex nonlinear computer tasks, known as the group method of data handling (GMDH), was first developed by Ivakhnenko. Figure 3 illustrates a functional flowchart for the GMDH technique. This technique can describe an explicit relationship between a mathematical model's inputs and its corresponding output 70,71. This technique is one of the most satisfactory groups of artificial neural networks (ANNs) and is also known as a polynomial neural network (PNN). The inner layers of the GMDH technique contain a variety of independent nodes. This technique is provided in polynomial form, where all nodes in every layer are joined in pairs through a quadratic polynomial and generate new polynomial-formed nodes in the subsequent layer, in accordance with the self-organizing principle 67,72.
The GMDH technique is based on employing several nodes from the middle layers. Every node of GMDH returns a value by applying a quadratic polynomial approach that incorporates the preceding node 65,73. As previously pointed out, the nodes of the next layers are produced using the quadratic polynomial functions that combine the nodes already present in the prior layers. The following formula represents the procedure of the GMDH technique 65,67,74 : in which x_ij…k and y_i respectively depict input and output variables of the model, m and w respectively indicate the size of layers and the number of inputs, and c and c_ij…k stand for the polynomial coefficients.
By defining new nodal parameters (namely P_i) for the situation of two nodes connected using a quadratic polynomial approach, the below formula is presented 65 : The least-squares method (LSM) is then used to minimize the variance between model predictions and actual data, as shown below 65 : where N_t and w are the size of the training subset and the number of variables, respectively.
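As an illustration of a single GMDH node, the sketch below fits a quadratic polynomial in two inputs by least squares through the normal equations. The coefficient ordering (constant, x_i, x_j, x_i², x_j², x_i·x_j), the helper names, and the synthetic data are assumptions made here; the source's own equations are not reproduced.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def features(xi, xj):
    """Terms of the assumed quadratic node polynomial."""
    return [1.0, xi, xj, xi * xi, xj * xj, xi * xj]


def fit_node(xi, xj, y):
    """Least-squares coefficients of one node from training data."""
    rows = [features(a, b) for a, b in zip(xi, xj)]
    k = len(rows[0])
    AtA = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    Aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(k)]
    return solve(AtA, Aty)


def predict_node(coef, xi, xj):
    return sum(c * f for c, f in zip(coef, features(xi, xj)))


# Synthetic training data generated from known (hypothetical) coefficients.
true_coef = [1.0, 0.5, -2.0, 0.3, 0.1, -0.4]
grid = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(5)]
xi = [p[0] for p in grid]
xj = [p[1] for p in grid]
y = [predict_node(true_coef, a, b) for a, b in grid]
coef = fit_node(xi, xj, y)
```

In a full GMDH network, one such node would be fitted for every pair of inputs, the best nodes on a held-out subset would be kept, and their outputs would feed the next layer.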
Later, the following general matrix will be created after obtaining the constants of Eq. ( 2) 65 : where The model constants are attained over the training step.According to the below criterion, the testing subset is utilized to identify the optimum combination of two independent variables 70 :

Old correlations presented in the previous studies
Two primary theoretical models have been established to approximate the specific heat capacity values of nanofluids. The first one was developed by Pak and Cho 18 in 1998 based on the mixing theory of ideal gases according to Eq. (6). The other was proposed by Xuan and Roetzel 75 in 2000 on the basis of the thermal equilibrium of nanoparticles and base-fluid according to Eq. (7).
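For orientation, the two theoretical models can be sketched in code using the forms in which they are commonly quoted in the nanofluid literature: a volume-fraction-weighted mixing rule for Pak and Cho, and a heat-capacity balance weighted by density for Xuan and Roetzel. The function names are introduced here, ϕ_v is taken as a fraction between 0 and 1, and the property values in the example are rough illustrative numbers rather than data from this study.

```python
def cp_pak_cho(phi_v, cp_np, cp_bf):
    """Mixing-rule model: volume-fraction-weighted average of specific heats."""
    return phi_v * cp_np + (1.0 - phi_v) * cp_bf


def cp_xuan_roetzel(phi_v, cp_np, cp_bf, rho_np, rho_bf):
    """Thermal-equilibrium model: (rho*cp) is averaged, then divided by rho_nf."""
    numerator = phi_v * rho_np * cp_np + (1.0 - phi_v) * rho_bf * cp_bf
    denominator = phi_v * rho_np + (1.0 - phi_v) * rho_bf
    return numerator / denominator


# Illustrative (assumed) properties for an alumina-in-water nanofluid at 4 vol%.
phi_v = 0.04
cp_np, cp_bf = 0.765, 4.18        # kJ/kg.K
rho_np, rho_bf = 3970.0, 997.0    # kg/m^3

cp_mix = cp_pak_cho(phi_v, cp_np, cp_bf)
cp_eq = cp_xuan_roetzel(phi_v, cp_np, cp_bf, rho_np, rho_bf)
```

With particles denser than the base-fluid, the density-weighted model gives a lower specific heat than the mixing rule, because the particles account for a larger share of the mass than of the volume.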
Besides these two models, several other empirical correlations have been generated by various researchers in recent studies. Since the full description of these studies was detailed in the literature review part of the present work, only the developed correlations and their validity conditions are reported here:
• Vajjha and Das 17 : where T is in Kelvin and the coefficients A, B, C, and D are quartic polynomial functions of ϕ_v. By applying Eq. (8), the specific heat capacity of Al₂O₃ nanoparticles dispersed in a 40:60 (wt%) water-ethylene glycol mixture can be calculated at volume concentrations of 2-10 vol% and a temperature range of 313-363 K.
• Çolak et al. 28 : where T is temperature (°C) and ϕ_v is volume fraction (vol%). This correlation, which has been developed for Cu-Al₂O₃/W hybrid nanofluid, has constants and error values shown in Table 2.
• Gao et al. 29 : Equation (13) has been generated based on experimental data of GrapheneOxide-Al₂O₃/W hybrid nanofluids, in which T and ϕ_m are temperature (°C) and mass fraction (wt%), respectively. Table 3 lists the constants and the error value related to this equation.

New correlations developed in the present study
(a) The GRG-based correlation:
Using the Solver tool available in the Data tab of Excel software, various mathematical relationships with different numbers of constants can be created through the implementation of the GRG technique. For this purpose, a correlation in the form of Eq. (14) was developed with the aim of minimizing the average absolute percent relative error (AAPRE) regarding input and output variables (for the total dataset). According to the average and mode values related to the nanoparticle size and temperature variables, available in the descriptive statistics presented in Table S2 (in the Supplementary File), the reference values for this equation were considered to be d_o = 50 nm and T_o = 300 K.

Evaluation and comparison of the correlations
In the current research, four new correlations, Eqs. (14) to (17), were obtained with the aim of estimating the specific heat capacity of mono-nanofluids by executing the GRG, GP, GEP, and GMDH techniques. In addition, three previously proposed correlations, Eqs. (6), (11), and (12), were selected and applied to the entire database. Then, in order to compare the estimated outcomes of all seven equations with their corresponding experimental values and to evaluate the performance and accuracy of those equations, the following statistical parameters were computed for the train and test subsets (including 1667 and 417 data-points, respectively) and the total database (with 2084 data-points) and are reported in Tables 4 and 5. As seen in Table 4, Eq. (6) can provide relatively good estimations of the specific heat capacity for the mono-nanofluids available in the database. A similar result has also been reported in the research of Vajjha and Das 17,20 and Murshed 31, but it contradicts the findings of Zhou and Ni 76 and Zhou et al. 77. However, Eq. (11) has not shown a good performance due to the parameter-related limitations implied in Ref. 24 and the broad range of the variables existing in the current work. In addition, a poor and unacceptable performance is observed from Eq. (12) because it was obtained based on Cu-Al₂O₃/W hybrid nanofluids, while the database used in the present study only comprises mono-nanofluids. This result is owing to the dissimilarity of stability levels and thermophysical properties found in mono and hybrid nanofluids caused by their structure-related differences.

Average Percent Relative Error (APRE)
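The statistical parameters used throughout this section can be sketched as below. The definitions follow the conventional forms (relative error taken as experimental minus estimated, divided by experimental, which agrees with the sign convention described for the relative-error plots); the exact expressions of the source equations are not available here, so these should be read as assumptions, and the three sample values are made up.

```python
import math


def relative_errors(exp, est):
    """Percent relative error of each point: 100 * (exp - est) / exp."""
    return [(e - p) / e * 100.0 for e, p in zip(exp, est)]


def apre(exp, est):
    """Average Percent Relative Error (signed, shows bias)."""
    return sum(relative_errors(exp, est)) / len(exp)


def aapre(exp, est):
    """Average Absolute Percent Relative Error."""
    return sum(abs(r) for r in relative_errors(exp, est)) / len(exp)


def rmse(exp, est):
    """Root Mean Square Error."""
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, est)) / len(exp))


def r_squared(exp, est):
    """Coefficient of determination (standard form, assumed here)."""
    mean = sum(exp) / len(exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(exp, est))
    ss_tot = sum((e - mean) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot


# Made-up values in kJ/kg.K, for illustration only.
exp_cp = [4.0, 2.0, 5.0]
est_cp = [4.2, 1.9, 5.0]
stats = {
    "APRE": apre(exp_cp, est_cp),
    "AAPRE": aapre(exp_cp, est_cp),
    "RMSE": rmse(exp_cp, est_cp),
    "R2": r_squared(exp_cp, est_cp),
}
```

An APRE near zero together with a non-zero AAPRE, as in this toy example, indicates errors that cancel in sign rather than an accurate model, which is why both are reported.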
If the results of the four novel correlations developed in the current study are compared with the estimations of the other three equations, it can be seen that these four correlations have very good accuracy in estimating the specific heat capacity of mono-nanofluids and have achieved acceptable error rates (AAPRE < 3.2% and R² > 0.95). This is also depicted in Fig. 4 as a graphical analysis of the AAPRE and R² parameters. On closer inspection, the best performance is found for the correlations obtained using the GMDH and GEP techniques.
Graphical error analysis is a suitable and useful tool for checking the performance of predictive models, especially when multiple models are to be compared side by side. Figure 5 displays percent relative errors related to the estimated values of C_p,nf obtained from the GRG, GP, GEP, and GMDH techniques for the train and test subsets versus the experimental values of C_p,nf. In this graph, the data-points are scattered
As illustrated in Fig. 5, a large percentage of the experimental data-points are overestimated by all techniques. In other words, a majority of the estimated values are greater than the experimental values, and therefore, the relative errors are negative numbers [see Eq. (18)]. However, Fig. 5d demonstrates that the GMDH technique overestimates a relatively small fraction of data-points (low to medium values of C_p,nf) and estimates the other data-points evenly. Although this technique would be suitable for estimating medium- and large-range values of C_p,nf, it still shows a slight error tendency and should be used with care.
According to Fig. 5d, the relative errors related to the C_p,nf values estimated by the GMDH technique are located near the line with zero error (within the error range of −24.6% to +7.5%). This technique has made more accurate and reliable estimates than those of the other techniques.
In order to undertake a more thorough analysis regarding the accuracy and precision of the suggested correlations, another visual evaluation was performed by utilizing a particular kind of scatter chart known as the cross-plot. In this method, the values estimated by a correlation are plotted against the experimental data and compared with a 45° line (known as a unit-slope line) passing through the plot's origin. Hence, the effectiveness of each correlation can be judged by how close the trend is to the 45° line. The cross-plots related to the developed correlations for C_p,nf estimation applying the GRG, GP, GEP, and GMDH techniques are presented in Fig. 6.
According to Fig. 6, a fairly dense accumulation of data-points along the unit-slope line is observed, particularly for the GMDH technique, whose outputs are in good agreement with the experimental data. Furthermore, the GRG technique has higher data deviations from the unit-slope line than the other techniques.
In Fig. 7, the group-error distribution diagrams in terms of five input variables are exhibited for three old and four novel correlations. This method first involves classifying the input or independent variables into a certain number of ranges (or categories) according to the scope of their changes, after which the estimation error values for the target variable are determined and plotted in each range.
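The grouping procedure described above can be sketched as follows: one input variable is split into ranges and the AAPRE of the estimates is computed within each range. The function name, the half-open binning rule, and the sample values are assumptions made for illustration.

```python
def group_aapre(x, exp, est, edges):
    """AAPRE (%) of the estimates inside each range [edges[k], edges[k+1]) of x.

    The last range also includes its upper edge. Ranges that contain no
    data-points are reported as None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xv, e, p in zip(x, exp, est):
        for k in range(n_bins):
            is_last = k == n_bins - 1
            if edges[k] <= xv < edges[k + 1] or (is_last and xv == edges[-1]):
                sums[k] += abs((e - p) / e) * 100.0
                counts[k] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up example: particle sizes (nm) grouped into four equal-width ranges.
d_np = [12.0, 20.0, 55.0, 60.0]
exp_cp = [4.0, 4.0, 2.0, 2.0]
est_cp = [4.4, 3.6, 2.1, 2.0]
edges = [10.0, 24.0, 38.0, 52.0, 66.0]
errors_by_range = group_aapre(d_np, exp_cp, est_cp, edges)
```

Plotting one such list per correlation against the range labels gives a group-error diagram of the kind discussed in this section.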
As depicted by Fig. 7a regarding the "average particle size" parameter, the lowest AAPRE values correspond to the range of 52-66 nm for the GRG, GP, and GEP techniques and the range of 10-24 nm for the GMDH technique. As shown in Fig. 7b regarding the "particle volume fraction" parameter, the lowest AAPRE values are achieved in the range of 4-6 vol% for the GRG and GEP techniques and the range of 8-10 vol% for the GP, GEP, and GMDH techniques. As can be seen in Fig. 7c concerning the "temperature" parameter, the lowest AAPRE values are attained in the range of 373.15-413.15 K for the GRG and GMDH techniques, the range of 293.15-333.15 K for the GP technique, and the range of 333.15-373.15 K for the GEP technique. As indicated in Fig. 7d regarding the "nanoparticle specific heat" parameter, the lowest AAPRE values are seen in the range of 0.536-0.692 kJ/kg.K (mainly Si₃N₄, TiN, CuO, and TiO₂) for the GRG technique, the range of 0.692-0.848 kJ/kg.K (mainly Al₂O₃ and SiO₂) for the GEP technique, and the range of 1.004-1.160 kJ/kg.K (mainly MgO) for the GP and GMDH techniques. As can be observed in Fig. 7e concerning the "base-fluid specific heat" parameter, the lowest AAPRE values are found for all four techniques (GRG, GP, GEP, and GMDH) in the range of 2.210-2.632 kJ/kg.K (mainly ethylene glycol). It is worth noting that as the heat capacity values of the base-fluids increase, the AAPRE values computed for the correlation of Çolak et al. 28 [i.e., Eq. (12)] decrease.
Figure 8 illustrates a graphical analysis of cumulative frequency for evaluating the performance of the different correlations mentioned in the present study. In this plot, the cumulative frequency of all data-points is depicted versus the percent relative errors to measure the number of data-points that the correlations can reliably estimate. From Fig. 8, it can be inferred that the GMDH technique surpasses the other techniques by estimating more than 90% of the total data-points with a relative error of −10% to +5%. Furthermore, the advantage of the new correlations developed in this study over the other correlations is clearly visible.
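A cumulative-frequency curve of this kind, and the share of data-points falling inside a given error band, can be computed as in the sketch below. The function names and the four sample points are assumptions for illustration; the band limits default to the −10% to +5% window quoted above.

```python
def relative_errors(exp, est):
    """Percent relative error of each point: 100 * (exp - est) / exp."""
    return [(e - p) / e * 100.0 for e, p in zip(exp, est)]


def cumulative_frequency(exp, est):
    """Sorted errors paired with the cumulative percentage of data-points."""
    errs = sorted(relative_errors(exp, est))
    n = len(errs)
    return [(err, 100.0 * (k + 1) / n) for k, err in enumerate(errs)]


def fraction_within(exp, est, low=-10.0, high=5.0):
    """Fraction of data-points whose relative error lies in [low, high] %."""
    errs = relative_errors(exp, est)
    return sum(1 for r in errs if low <= r <= high) / len(errs)


# Made-up values: errors of about -5 %, +2.5 %, -15 % and +10 %.
exp_cp = [4.0, 4.0, 4.0, 4.0]
est_cp = [4.2, 3.9, 4.6, 3.6]
curve = cumulative_frequency(exp_cp, est_cp)
share = fraction_within(exp_cp, est_cp)
```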

Outliers detection and applicability domain of a technique
A popular and effective method known as the "leverage statistical approach" is applied to find probable outliers (or unexpectedly aberrant data) and designate the applicability domain of the developed correlations 78-80. For this purpose, this method generates a graph known as the "Williams plot" by determining two parameters, the Hat Matrix (H) and the Standardized Residuals (SR), utilizing Eqs. (23) and (24) 80-82 : where the letter T indicates the transpose operator. X is an N_t × N_i matrix, where N_t and N_i depict the number of all data-points and technique inputs, respectively.
where RMSE refers to the root mean square error of the used technique, e_i denotes the discrepancy of the i-th estimated data-point from its associated experimental value, and the Hat Indices H_ii represent the diagonal components of the H matrix (i.e., H_ii = diag(H)).
The Williams plot visually displays three important zones associated with the existence of valid data, out-of-leverage data, and suspected data (outliers) by revealing the relation between Standardized Residuals and Hat Indices. Valid data-points can be found in the ranges −3 ≤ SR ≤ +3 and 0 ≤ H ≤ H*. The values −3 and +3 for the parameter SR are called the cut-off metrics. The parameter H*, also referred to as the critical leverage or warning leverage, has the value 3(N_i + 1)/N_t. The data-points that belong to the ranges −3 ≤ SR ≤ +3 and Due to their negative effect on the slope and intercept of the constructed regression line, they are considered a risk to reliable modeling and are also called "bad high leverage" points 78-83. Figure 9 exhibits the Williams plot for the best-proposed technique (i.e., GMDH) based on the total dataset employed in the current study. This graph shows that the GMDH technique detects 1.68% of the 2084 data-points (35 red triangular marks) as probable outliers. Furthermore, just 0.62% of the 2084 data-points (13 blue square marks) are found to be outside the applicability domain of the GMDH technique. As a result, due to the existence of a substantial portion of valid data-points (green circular marks) in the ranges −3 ≤ SR ≤ +3 and 0 ≤ H ≤ 0.0086, the accuracy and reliability of this technique are confirmed again.
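The quantities behind a Williams plot can be sketched as follows. The code assumes the conventional definitions, namely H = X(XᵀX)⁻¹Xᵀ for the hat matrix and SR_i = e_i / (RMSE·√(1 − H_ii)) for the standardized residuals, together with the warning leverage 3(N_i + 1)/N_t stated above. The three-way classification rule reflects the usual reading of the plot and is an assumption, as are all function names and the small sample matrix.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hat_indices(X):
    """Diagonal of H = X (X^T X)^-1 X^T, one value per data-point."""
    k = len(X[0])
    XtX = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    return [sum(a * b for a, b in zip(row, solve(XtX, row))) for row in X]


def standardized_residuals(exp, est, h):
    """SR_i = e_i / (RMSE * sqrt(1 - H_ii)), with e_i = exp_i - est_i."""
    e = [a - b for a, b in zip(exp, est)]
    rmse = math.sqrt(sum(v * v for v in e) / len(e))
    return [v / (rmse * math.sqrt(1.0 - hi)) for v, hi in zip(e, h)]


def warning_leverage(n_inputs, n_points):
    """Critical leverage H* = 3 (N_i + 1) / N_t."""
    return 3.0 * (n_inputs + 1) / n_points


def classify(sr, h, h_star):
    """Assumed zoning: |SR| > 3 is suspected, otherwise split at H*."""
    if abs(sr) > 3.0:
        return "suspected"
    return "valid" if h <= h_star else "out-of-leverage"


# Small made-up input matrix (4 data-points, 2 inputs) and outputs.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
h = hat_indices(X)
sr = standardized_residuals([1.0, 2.0, 3.1, 3.9], [1.1, 1.9, 3.0, 4.0], h)
h_star = warning_leverage(5, 2084)   # five inputs, 2084 data-points
```

With five inputs and 2084 data-points, the warning leverage evaluates to about 0.0086, which matches the upper H limit quoted for the valid zone.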

Sensitivity analysis
Sensitivity analysis can serve several purposes, such as detecting errors in the model's structure, determining the degree of relation between the input and output variables of a model, and refining a model by discarding inputs that have little or no effect on its output. The relevancy factor (r) is a parameter that is used to assess how much every input variable influences the output of a model. This factor's value can vary from −1 to +1, and the greater its absolute value for an input variable, the higher its impact on the model's output 30.
One of the ways to determine the relevancy factors is to use the Correlation option in the Data Analysis tools available in the Data tab of Excel software. The relevancy factors were calculated for every one of the techniques proposed in the present research, and similar results were obtained. Figure 10 represents the relevancy factors derived for each input variable according to the estimations of all techniques. The parameters of temperature (T) as well as the specific heat capacities of the nanoparticle (C_p,np) and base-fluid (C_p,bf) have a direct relationship with the nanofluids' specific heat capacity (C_p,nf), whereas the average particle size (d_np) and volume fraction (ϕ_v) have an inverse relationship with it. Additionally, this figure indicates that the two parameters C_p,bf and d_np (with r factors of about +0.94 and −0.42, respectively) have significant impacts on estimating the target variable C_p,nf. This finding was also reported in the studies of Jamei et al. 84,85, who investigated the specific heat capacity of various metal-oxide-based and carbon-based nanofluids dispersed in different base-fluids through the application of powerful machine learning techniques.
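Outside a spreadsheet, the same relevancy factor can be computed as the Pearson correlation coefficient between one input variable and the model output. The sketch below assumes that definition; the function name and the sample values are illustrative only.

```python
import math


def relevancy_factor(x, y):
    """Pearson correlation coefficient between an input x and the output y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up data: the output rises with the first input and falls with the second.
output = [2.0, 4.0, 6.0, 8.0]
r_direct = relevancy_factor([1.0, 2.0, 3.0, 4.0], output)    # direct relation
r_inverse = relevancy_factor([4.0, 3.0, 2.0, 1.0], output)   # inverse relation
```

A positive value indicates that the output grows with the input, a negative value indicates the opposite, and the magnitude ranks the inputs by influence.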

Analyses of statistical trends
Based on the relevancy factors outlined in the prior section, the inverse influence of average particle size (d_np) and volume fraction (ϕ_v) on the mono-nanofluids' specific heat capacity (C_P,nf) was revealed. In addition, the three other variables, namely temperature (T), nanoparticle specific heat capacity (C_P,np), and base-fluid specific heat capacity (C_P,bf), had a direct influence on the target variable C_P,nf. The graphical trend analysis shown in Fig. 11 makes this quite evident. Here, only some experimental data and their corresponding values estimated by the best technique (i.e., GMDH) are provided. Figure 11a depicts the decreasing trend of mono-nanofluid heat capacity (C_P,nf) with growing nanoparticle size (d_np). This effect was also found in the investigations of Zhang et al. [86], Angayarkanni et al. [87], Xiong et al. [88], Wang et al. [89], and Novotny et al. [90]. This trend can be attributed to the fact that smaller nanoparticles may have greater surface-to-volume ratios and atomic thermal vibrational energies [86,88]. However, according to the study conducted by Żyła et al. [38] on the specific heat capacity of nanofluids comprising AlN, TiN, and Si3N4 dispersed in ethylene glycol, the nanoparticle size had a negligible effect. Such a finding was also reported in the research of Satti et al. [25].
Based on the experimental results of Murshed [31], Fig. 11b shows that the values of mono-nanofluid heat capacity (C_P,nf) decrease as the volume fraction of nanoparticles (ϕ_v) increases. Furthermore, Satti et al. [25] found that for small volume fractions (varying from 0.5% to 1.5%), the specific heat values of their examined nanofluids … [92] also disclosed that the specific heat capacity of nanofluids containing copper, silver, and aluminum nanoparticles suspended in water or ethylene glycol reduced as the nanoparticles' concentration increased. The reason is that the solid nanoparticles have lower specific heat capacities in comparison with the base-fluids. Therefore, the more nanoparticles are dispersed in a certain amount of base-fluid, the more the specific heat capacity of the resultant nanofluid will decrease [17,93]. Similar outcomes were also attained in this field by Maghrabie et al. [94], Moldoveanu and Minea [27], Verma et al. [35], and Zhou and Ni [76].
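The dilution argument above can be illustrated with the simple volume-fraction-weighted mixing rule, C_P,nf = ϕ_v·C_P,np + (1 − ϕ_v)·C_P,bf, which is often used as a first approximation. The sketch below is an assumption for illustration only: this rule is not one of the correlations developed in the present study, the function name is hypothetical, and the property values are round illustrative numbers rather than entries from the collected database.

```python
def mixing_rule_cp(phi_v, cp_np, cp_bf):
    """Volume-fraction-weighted specific heat of a nanofluid.

    phi_v: nanoparticle volume fraction as a fraction between 0 and 1.
    cp_np, cp_bf: specific heat capacities of nanoparticle and base-fluid
    in the same units, e.g. kJ/(kg K).
    """
    if not 0.0 <= phi_v <= 1.0:
        raise ValueError("phi_v must lie between 0 and 1")
    return phi_v * cp_np + (1.0 - phi_v) * cp_bf


# Illustrative round numbers (assumed, not measured in this study).
cp_base_fluid = 4.18    # kJ/(kg K), roughly water near room temperature
cp_nanoparticle = 0.77  # kJ/(kg K), a typical order of magnitude for an oxide

for phi in (0.0, 0.01, 0.02, 0.04):
    cp_nf = mixing_rule_cp(phi, cp_nanoparticle, cp_base_fluid)
    print(f"phi_v = {phi:.2f} -> C_P,nf = {cp_nf:.3f} kJ/(kg K)")
```

Because the nanoparticle term carries the lower specific heat, every increase in ϕ_v lowers the weighted sum, which is the qualitative trend seen in Fig. 11b.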
Fig. 11c shows how the specific heat of the mono-nanofluids (C_P,nf) varies with temperature (T): there is a direct relationship between these two variables. As the temperature of the nanofluid rises, its ability to absorb and store thermal energy (without undergoing a phase transition) grows, and as a result, the specific heat values increase. This behavior has been clearly noted in numerous studies, including Gao et al. [29], Wole-Osho et al. [39], Raud et al. [91], Popa et al. [26], Angayarkanni et al. [87], Elias et al. [33], Barbés et al. [21,22], Heyhat et al. [95], and Vajjha and Das [17,19,20].
Figures 11d and 11e illustrate how the specific heat of the mono-nanofluids changes with the specific heat capacities of the nanoparticles and base-fluids, respectively. The direct relationship of C_P,nf with C_P,np and C_P,bf is presented for some experimental data. According to Fig. 11d, TiN-containing nanofluids have a greater specific heat capacity than those containing Si3N4 (taken from the experimental data of Żyła et al. [38]) due to the higher specific heat value of TiN nanoparticles, when considering a certain base-fluid (in this case, ethylene glycol) and keeping other parameters constant. Such an effect was also reported in the experimental study of Vijayakumar et al. [37] regarding the difference in specific heat values of CuO/W and Al2O3/W nanofluids. As seen in Fig. 11e for the experimental data of Cabaleiro et al. [23], since the specific heat value of the water-ethylene glycol mixture is greater than that of ethylene glycol alone, the specific heat capacity of nanofluids comprising ZnO suspended in the W+EG mixture is higher. Moreover, Akilu et al. [36] came to a similar conclusion by evaluating the specific heat changes of SiO2 nanoparticles dispersed in three different base-fluids (ethylene glycol, glycerol, and ethylene glycol+glycerol). According to Fig. 11e, the base-fluid type has a considerable effect on the specific heat capacity of a nanofluid. As previously stated, the relevancy factor corresponding to the variable C_P,bf had the highest value (+0.94).

Conclusions
Four soft-computing techniques were implemented on a database including 2084 data-points taken from 19 experimental studies in order to estimate the specific heat capacity of diverse mono-nanofluids (C_P,nf) as a function of five input variables (d_np, ϕ_v, T, C_P,np, and C_P,bf). Then, four distinct high-accuracy correlations were developed, and numerous statistical and graphical error analyses were also conducted. The performance of these novel correlations was compared to the outcomes of three earlier correlations published by other researchers. The main advantage of these novel empirical correlations is that they significantly reduce the constraints of prior correlations. As a result, they are applicable to various oxide-based mono-nanofluids for a broad range of independent variable values.
According to the outcomes of each correlation developed in this research, the utilized techniques can be ranked by accuracy as GMDH > GEP > GP > GRG. The results revealed that the correlation achieved using the GMDH technique is the most effective and appropriate one (with AAPRE = 2.4163% and R² = 0.9743). However, the GEP technique also exhibited comparable, acceptable performance. By applying the leverage statistical approach to the GMDH technique and distinguishing valid data, out-of-leverage data, and suspected data (outliers), the reliability and accuracy of this technique were confirmed. To recognize the relation between input and output variables available in the entire dataset, a sensitivity analysis was performed and the relevancy factors were determined for the best technique (GMDH). It was found that average nanoparticle size and base-fluid specific heat capacity are the significant influencing factors in establishing the specific heat values of mono-nanofluids. Ultimately, the variation trend of the target variable (C_P,nf) with each of the input variables was graphically evaluated using the experimental data obtained from the previous studies and the estimated data generated by the GMDH technique. The findings of this study agree

Figure 1. A flowchart of the GP technique.

Figure 2. A flowchart of the GEP technique.

Figure 3. A flowchart of the GMDH technique.

Figure 4. Graphical comparison of AAPRE and R² values for the introduced correlations.
https://doi.org/10.1038/s41598-023-47327-x

recognized as "good high leverage" points, which are outside the applicability domain of the technique by which a correlation has been generated. The rest of the data-points with SR < −3 or SR > +3 (independent of H values) are classified as suspected data or outliers.

Figure 7. AAPRE distribution in different ranges of input variables for the suggested correlations.

Figure 8. Cumulative frequency against percent relative error for suggested correlations.

Figure 9. Williams plot related to the correlation developed by the GMDH technique.

Figure 10. Relevancy factors between input and output variables.

Figure 11. Variations of mono-nanofluids' specific heat capacity with each input parameter.

Table 4. The statistical parameters of old correlations presented in the previous studies.

Table 5. The statistical parameters of new correlations developed in the current study.